aknayar commented Apr 4, 2026
```cpp
float lower_bound = exact_distances[idx] - cauchy_schwarz_bound;
if constexpr (C::is_max) {
```
Unfortunately, `C::cmp()` kills autovectorization here, so we resort to this workaround.
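A minimal sketch of the kind of branch the diff is using, with hypothetical comparator types standing in for faiss's `CMax`/`CMin` (only the compile-time `is_max` flag is assumed here; the real pruning loop in the PR is more involved):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical comparator traits in the spirit of faiss's CMax/CMin.
// Only the compile-time flag is consulted; cmp() is never called.
struct CMaxLike { static constexpr bool is_max = true; };
struct CMinLike { static constexpr bool is_max = false; };

// Counts candidates that survive pruning against a threshold. Branching
// on the compile-time constant keeps the loop body a plain float
// comparison, which the compiler can autovectorize; routing the
// comparison through an opaque C::cmp() call defeats that.
template <typename C>
size_t count_survivors(const std::vector<float>& lower_bounds, float threshold) {
    size_t n = 0;
    for (float lb : lower_bounds) {
        if constexpr (C::is_max) {
            n += (lb < threshold);  // max-heap: keep if bound is below threshold
        } else {
            n += (lb > threshold);  // min-heap: keep if bound is above threshold
        }
    }
    return n;
}
```

Because `if constexpr` is resolved at instantiation time, each instantiation compiles down to a single branch-free comparison in the loop body.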
aknayar commented Apr 4, 2026
```cpp
write_ivf_header(ivfp, f);
WRITE1(ivfp->n_levels);
WRITE1(ivfp->batch_size);
if (ivfp->batch_size == Panorama::kDefaultBatchSize) {
```
For backward compatibility.
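To illustrate the general pattern (not the PR's actual serialization logic, which isn't fully visible in the quoted diff), here is a self-contained sketch where `write1` stands in for faiss's `WRITE1` macro and indexes built with the default batch size keep the legacy on-disk layout:

```cpp
#include <cstdint>
#include <vector>

// Stand-in for faiss's WRITE1 macro: append a raw field to a byte buffer.
template <typename T>
void write1(std::vector<uint8_t>& out, const T& v) {
    const uint8_t* p = reinterpret_cast<const uint8_t*>(&v);
    out.insert(out.end(), p, p + sizeof(T));
}

constexpr int32_t kDefaultBatchSize = 128;  // assumed value, from the diff below

// Hypothetical writer: an index using the default batch size is written
// in the old layout (no extra field), so readers that predate the field
// can still open it; a non-default batch size opts into the new layout.
void write_index(std::vector<uint8_t>& out, int32_t n_levels, int32_t batch_size) {
    write1(out, n_levels);
    if (batch_size != kDefaultBatchSize) {
        write1(out, batch_size);  // new-format field, absent in legacy files
    }
}
```

The trade-off of such version-by-default schemes is that the default value becomes part of the file format contract, which is exactly what the next comment is about.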
aknayar commented Apr 4, 2026
```cpp
 * accelerating the refinement stage.
 */
struct Panorama {
    static constexpr size_t kDefaultBatchSize = 128;
```
I'm considering defining `kLegacyDefaultBatchSize = 128` and `kDefaultBatchSize = 1024` to update the default while keeping a fallback for old indexes that were created with 128. Is such a change in default behavior allowed? (`IVF128,FlatPanorama8` would then silently use a 1024 `batch_size` instead of 128.)
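A sketch of the scheme proposed in the comment (the constant names are from the comment; the reader-side fallback behavior is an assumption, with `0` standing in for "field absent from an old index file"):

```cpp
#include <cstddef>

// Proposed constants: new indexes default to 1024, while indexes written
// before the batch_size field existed implicitly used 128.
struct PanoramaDefaults {
    static constexpr size_t kLegacyDefaultBatchSize = 128;
    static constexpr size_t kDefaultBatchSize = 1024;
};

// Hypothetical reader-side fallback: if the on-disk file carried no
// batch_size (encoded here as 0), assume the legacy default.
size_t resolve_batch_size(size_t stored_batch_size) {
    if (stored_batch_size == 0) {
        return PanoramaDefaults::kLegacyDefaultBatchSize;
    }
    return stored_batch_size;
}
```

Under this scheme, old index files keep their original behavior, and only newly created indexes pick up the larger default.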
aknayar commented Apr 4, 2026
```cpp
template <typename Lambda>
inline auto with_bool(bool value, Lambda&& fn) {
```
I'm curious if there's a more appropriate location to define this.
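For context, a helper with this signature is a standard runtime-to-compile-time dispatcher. A minimal self-contained sketch (the body and the `classify` usage example are assumptions; only the signature appears in the diff):

```cpp
#include <type_traits>
#include <utility>

// Lift a runtime bool into a compile-time constant: the lambda receives
// std::true_type or std::false_type, so it can branch with `if constexpr`.
template <typename Lambda>
inline auto with_bool(bool value, Lambda&& fn) {
    if (value) {
        return std::forward<Lambda>(fn)(std::true_type{});
    } else {
        return std::forward<Lambda>(fn)(std::false_type{});
    }
}

// Hypothetical usage: both branches are stamped out at compile time and
// the dead one is discarded, so each path can be optimized independently
// (e.g. a vectorized path vs. a scalar fallback).
int classify(bool use_simd) {
    return with_bool(use_simd, [](auto tag) {
        if constexpr (decltype(tag)::value) {
            return 1;  // e.g. vectorized path
        } else {
            return 0;  // e.g. scalar fallback
        }
    });
}
```

Similar dispatch-on-constant utilities often end up in a shared utils header since they are not specific to any one index type.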
aknayar commented Apr 4, 2026
```diff
 # All modern CPUs support F, CD, VL, DQ, BW extensions.
 # Ref: https://en.wikipedia.org/wiki/AVX512
-target_compile_options(faiss_avx512 PRIVATE $<$<COMPILE_LANGUAGE:CXX>:-mavx2 -mfma -mf16c -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw -mpopcnt>)
+target_compile_options(faiss_avx512 PRIVATE $<$<COMPILE_LANGUAGE:CXX>:-mavx2 -mfma -mf16c -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw -mpopcnt ${FAISS_BMI2_FLAGS}>)
```
Will have to add this to avx512_spr as well once #5034 goes in.
Note: Should be merged before #4970 (IVFPQPanorama).
## Changes

This PR implements various optimizations to Panorama (L2Flat and IVFFlat).

### Performance
- Use `if constexpr (C::is_max)` instead of `C::cmp` for autovectorized pruning.
- Use `_pext_u64`.
- Remove the `active_indices` indirection to let it autovectorize (`IndexFlat`/`IVFFlatScannerPanorama`).
- Add `batch_size` as a parameter for IVFFlatPanorama (for consistency with `IndexFlatPanorama`, but also because a 1024 `batch_size` can improve performance).

### Other
- Define `kDefaultBatchSize` once in `Panorama.h` (previously defined in 5 separate locations).
- Update `bench_flat_l2_panorama.py` and `bench_ivf_flat_panorama.py` to accept `gist1M` or `sift1M` as the dataset to bench on.

## Results
Together, these optimizations enable powerful additional speedups, especially on lower-dimensional datasets like SIFT (128d), by dramatically minimizing Panorama's overhead:
GIST1M (IVF128, nlist=128, nlevels=16)
SIFT1M (IVF128, nlist=128, nlevels=8)
### Raw Data
Collected by running the new benches on `main` and this branch. On `main`, you cannot specify `batch_size`, so remove the `{1024}` from the factory string in the new benches to run them there. The results above are calculated from the following raw data as follows: for each `nprobe`, speedup = (original ms per query) / (pano ms per query).

Before (`main`)

GIST1M:
SIFT1M:
After (`optimize-pano`)

GIST1M:
SIFT1M: